IBM made its Granite13B large language model (LLM) available for enterprise applications in May. Now Armand Ruiz, vice president of IBM's AI platform products, has publicly disclosed the full contents of the 6.48TB dataset used to train Granite13B. After strict preprocessing, the dataset was reduced to 2.07TB, a reduction of 68%. Ruiz emphasizes that this step is crucial for producing a high-quality, unbiased, ethical, and legally compliant dataset that meets the needs of enterprise applications.
The datas